Students Seminar

2016-11-17

Version control with Git

Andreas Karlsson
Contact: andreas.a.karlsson@ki.se

Show of hands

  • Who writes code for their PhD?
  • Who has considered version control (vc), but not used it?
  • Who is regularly using vc?
  • Who is using git?
  • Who is using svn?
  • Others?

Motivation

  • Why use vc?
    • reproducible research
  • Why use software for vc?
    • a log book closely linked to your code
    • saves time in the long run
    • serenity now
  • Why use git?
    • no server needed
    • widely adopted
    • lots of linked services

Naming conventions

Local vs collaborative

  • local version control - really simple
    • opinion: worthwhile for most PhD students
    • focus of this presentation
  • collaboration - requires communication
    • show some possibilities - no details
    • focus of other presentation

What does git do?

  • keeps track of different versions of files
  • stores a snapshot of the project at each commit; unchanged files are not stored again
  • tracks each change's order, time point and author
  • generally only one version in the working directory
  • "undo" for your project

Lingo

  • repository
  • commit
  • branch
  • HEAD
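The four terms above can be looked up in any repository from the shell. The following is a minimal sketch, assuming a throwaway repository in a temporary directory; the file content, commit message and the "Demo User" identity are placeholders and not part of the talk.

```shell
# Throwaway repository (hypothetical demo setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"  # placeholder identity

echo 'summary(fit)' > myFile.R
git add myFile.R
git commit -q -m "Add summary of model fit"

git rev-parse --git-dir    # repository: the .git directory that holds the history
git log --oneline -1       # commit: one recorded snapshot, shown with hash and subject
git branch                 # branch: a named line of development, * marks the current one
git rev-parse HEAD         # HEAD: the commit that is currently checked out

# While you are on a branch, HEAD and the branch name resolve to the same commit.
head_hash=$(git rev-parse HEAD)
branch_hash=$(git rev-parse "$(git symbolic-ref --short HEAD)")
```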

How to work with git

  • CLI - focus of examples today
  • Git-Gui - other presentation
  • RStudio
  • Magit (Emacs)
  • Web

Three states of the files
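The figure for this slide is not reproduced in the text. Assuming it refers to the usual three states of a tracked file (modified, staged, committed), the sketch below walks one file through them and records what git status --short reports at each step. The repository, file content and identity are placeholders.

```shell
# Throwaway repository (hypothetical demo setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"  # placeholder identity

echo 'fit <- 1' > myFile.R
git add myFile.R
git commit -q -m "Add myFile.R"

# 1. Modified: the file differs from the last commit, nothing is staged yet.
echo 'summary(fit)' >> myFile.R
state_modified=$(git status --short)

# 2. Staged: the change is in the index, ready for the next commit.
git add myFile.R
state_staged=$(git status --short)

# 3. Committed: the change is part of the history, the status is clean.
git commit -q -m "Summarise fit"
state_committed=$(git status --short)

printf '[%s]\n' "$state_modified" "$state_staged" "$state_committed"
```

In the short status format the first column describes the index and the second column the working directory, which is why the M moves from the second to the first position after git add.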

Git from shell - getting started

Creating a file called myFile.R

## Smoking, Alcohol and (O)esophageal Cancer
fit <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp * alcgp, data = esoph,
      family = binomial())

We create a repository…

git init
Initialized empty Git repository in /home/andkar/src/ki/Presentations/git/.git/

…and stage our file

git add myFile.R

Git from shell - committing

We check what's going on…

git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   myFile.R

…and commit our staged changes

git commit -m "Simple model of esophagel cancer"
[master b90f403] Simple model of esophagel cancer
 1 file changed, 3 insertions(+)

Git from shell - looking at diffs

Adding a few lines of code to our file

## The results from the logistic regression
summary(fit)

What has changed since last commit?

git diff
diff --git a/myFile.R b/myFile.R
index c388f49..4425392 100644
--- a/myFile.R
+++ b/myFile.R
@@ -1,3 +1,5 @@
 ## Smoking, Alcohol and (O)esophageal Cancer
   fit <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp * alcgp, data = esoph,
         family = binomial())
+  ## The results from the logistic regression
+  summary(fit)

We stage & commit those changes

git commit myFile.R -m "Summarise regression results"
[master 1110cbb] Summarise regression results
 1 file changed, 2 insertions(+)
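A detail that is easy to miss in the diff step: plain git diff only shows changes that are not yet staged. A minimal sketch, assuming a throwaway repository with placeholder content and identity, of how the same change moves from git diff to git diff --staged:

```shell
# Throwaway repository (hypothetical demo setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"  # placeholder identity

echo 'fit <- 1' > myFile.R
git add myFile.R
git commit -q -m "Add model"

echo 'summary(fit)' >> myFile.R
git diff                      # shows the added line: the change is not staged yet

git add myFile.R
unstaged=$(git diff)          # empty: nothing unstaged is left
staged=$(git diff --staged)   # the added line now shows up as a staged change
echo "$staged"
```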

View the commit log

We have a look at the commit history

git log
commit 1110cbbdfa5feb06d3942789a0e535cb7fe0143a
Author: andreasakarlsson <andreas.a.karlsson@gmail.com>
Date:   Wed Feb 10 16:01:38 2016 +0100

    Summarise regression results

commit b90f403615d50e4bcfdf8f87b890b32c2262a362
Author: andreasakarlsson <andreas.a.karlsson@gmail.com>
Date:   Wed Feb 10 15:58:28 2016 +0100

    Simple model of esophagel cancer

Roll back to the earlier commit

We decide to go back to the previous version

git checkout b90f4

We look at the commit message and diff

git show
commit b90f403615d50e4bcfdf8f87b890b32c2262a362
Author: andreasakarlsson <andreas.a.karlsson@gmail.com>
Date:   Wed Feb 10 15:58:28 2016 +0100

    Simple model of esophagel cancer

diff --git a/myFile.R b/myFile.R
index e69de29..c388f49 100644
--- a/myFile.R
+++ b/myFile.R
@@ -0,0 +1,3 @@
+## Smoking, Alcohol and (O)esophageal Cancer
+  fit <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp * alcgp, data = esoph,
+        family = binomial())
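Checking out a commit by its hash leaves the repository in a "detached HEAD" state: the newer commits are still there, they are just not checked out. A minimal sketch of the round trip, assuming a throwaway repository with placeholder content and identity:

```shell
# Throwaway repository (hypothetical demo setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"  # placeholder identity

echo 'fit <- 1' > myFile.R
git add myFile.R
git commit -q -m "Add model"
first=$(git rev-parse HEAD)               # hash of the first commit

echo 'summary(fit)' >> myFile.R
git commit -q -a -m "Summarise model"
branch=$(git symbolic-ref --short HEAD)   # name of the current branch

git checkout -q "$first"                  # detached HEAD: working directory shows the old snapshot
old_content=$(cat myFile.R)

git checkout -q "$branch"                 # back on the branch: newest commit is checked out again
new_content=$(cat myFile.R)
```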

Roll back options

Command       Scope         Common use cases
git checkout  Commit-level  Switch between branches or inspect old snapshots
git checkout  File-level    Discard changes in the working directory
git reset     Commit-level  Discard commits in a private branch or throw away uncommitted changes
git reset     File-level    Unstage a file
git revert    Commit-level  Undo commits in a public branch
git revert    File-level    (N/A)
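Three of the rows above can be tried out safely in a scratch repository. The sketch below assumes a throwaway repository; notes.txt, the file contents and the identity are hypothetical and only serve the illustration.

```shell
# Throwaway repository (hypothetical demo setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"  # placeholder identity

echo 'fit <- 1' > myFile.R
git add myFile.R
git commit -q -m "Add model"
echo 'summary(fit)' >> myFile.R
git commit -q -a -m "Summarise model"

# File-level checkout: discard an uncommitted edit in the working directory.
echo 'oops' >> myFile.R
git checkout -- myFile.R

# File-level reset: unstage a file, its content on disk is kept.
echo 'notes' > notes.txt
git add notes.txt
git reset -q HEAD notes.txt

# Commit-level revert: undo the latest commit by adding a new, opposite commit.
git revert --no-edit HEAD

git log --oneline
```

Note that git revert adds a third commit instead of deleting the second one, which is why it is the option listed for public branches.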

Commits best practice

  • Your commit messages together with the diffs will become the history of your project - make it accessible for yourself and collaborators.
  • A diff will tell you what changed, but only the commit message can properly tell you why.

The seven rules of a great git commit

  1. Separate subject from body with a blank line
  2. Limit the subject line to 50 characters
  3. Capitalize the subject line
  4. Do not end the subject line with a period
  5. Use the imperative mood in the subject line
  6. Wrap the body at 72 characters
  7. Use the body to explain what and why vs. how
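A commit message that follows these rules could look like the one below. This is only a sketch: the message text is made up for illustration, and it is passed on standard input with git commit -F - so that subject and body can be written on separate lines. Repository and identity are placeholders.

```shell
# Throwaway repository (hypothetical demo setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"  # placeholder identity

echo 'summary(fit)' > myFile.R
git add myFile.R

# Subject line, blank line, then a body wrapped at 72 characters.
git commit -q -F - <<'EOF'
Summarise logistic regression results

Print the coefficient table so that the effect estimates can be
checked without re-running the model interactively. No change to
the model itself.
EOF

subject=$(git log -1 --format=%s)
echo "${#subject} characters in subject: $subject"
```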

Atomic commits

  • one task or one fix in each commit
  • easier to write a good commit message
  • easier to roll back a specific change
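One way to keep commits atomic is to stage files one at a time, even when several unrelated edits were made in the same session. A minimal sketch, assuming a throwaway repository; model.R and figures.R are hypothetical file names.

```shell
# Throwaway repository (hypothetical demo setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"  # placeholder identity

echo 'fit <- 1' > model.R
echo 'plot(1)' > figures.R
git add model.R figures.R
git commit -q -m "Add model and figure scripts"

# Two unrelated edits made in the same work session...
echo 'summary(fit)' >> model.R
echo 'title("Figure 1")' >> figures.R

# ...each goes into its own commit by staging one file at a time.
git add model.R
git commit -q -m "Summarise model fit"
git add figures.R
git commit -q -m "Add title to figure 1"

git log --oneline
```

When unrelated edits end up in the same file, git add -p lets you stage them piece by piece; it is interactive, so it is not shown here.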

Cleaning up with .gitignore

  • lists files that git should leave untracked
  • ignore automatically produced files e.g. LaTeX's *.aux or *.toc
  • ignore uninteresting subdirectories e.g. **/log/
  • exempt a file from an ignore rule with a leading ! e.g. !myFile.R
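The patterns above can be combined in one .gitignore file. The sketch below is an illustration under assumptions: the file names are made up, and the *.R rule is contrived purely so that the !myFile.R exception has something to override.

```shell
# Throwaway repository (hypothetical demo setup).
repo=$(mktemp -d)
cd "$repo"
git init -q

cat > .gitignore <<'EOF'
# LaTeX by-products
*.aux
*.toc
# any directory called log, at any depth
**/log/
# contrived rule: ignore all R scripts...
*.R
# ...except this one
!myFile.R
EOF

mkdir -p analysis/log
touch paper.tex paper.aux paper.toc analysis/log/run.txt scratch.R myFile.R

git status --short        # ignored files are not listed as untracked
git add -A                # stages everything that is not ignored
tracked=$(git ls-files)
echo "$tracked"
```

Order matters inside .gitignore: the last matching pattern wins, so the exception has to come after the rule it overrides.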

Tags

  • name important points in history
    • a version of your package e.g. submitted to CRAN
    • a version of your paper submitted to a journal
    • a version of your analysis used with a specific data set version

Tag examples

Create an annotated tag

git tag -a v1.4 -m "my version 1.4"

See data for a specific tag

git show v1.4
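A tag keeps pointing at the commit it was created on, even when the branch moves on. A minimal sketch, assuming a throwaway repository with placeholder content and identity, that reuses the v1.4 tag name from above:

```shell
# Throwaway repository (hypothetical demo setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"  # placeholder identity

echo 'fit <- 1' > myFile.R
git add myFile.R
git commit -q -m "Add model"
tagged=$(git rev-parse HEAD)              # the commit we are about to tag

git tag -a v1.4 -m "my version 1.4"

echo 'summary(fit)' >> myFile.R
git commit -q -a -m "Summarise model"     # the branch moves on, the tag does not

git tag                                   # list all tags
tag_target=$(git rev-parse 'v1.4^{commit}')   # the commit the tag points to
echo "$tag_target"
```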

Collaboration - how I do it

Collaborate with Git

Create a local copy of your remote repository

git clone https://github.com/my_user/my_repository.git

Send local changes to your remote repository

git push

Download changes locally from your remote repository

git pull
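The clone, push and pull cycle can be rehearsed without any server by letting a bare repository on disk play the role of the remote. This is a sketch under assumptions: the directory names remote.git, laptop and desktop are hypothetical, and the identity is a placeholder set through environment variables so that it applies to both working copies.

```shell
# Placeholder identity for this demo only.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.org"

work=$(mktemp -d)
git init -q --bare "$work/remote.git"            # stand-in for a hosted repository

# First working copy: commit and push.
git clone -q "$work/remote.git" "$work/laptop"
cd "$work/laptop"
echo 'fit <- 1' > myFile.R
git add myFile.R
git commit -q -m "Add model"
git push -q -u origin HEAD                       # first push also sets the upstream branch

# Second working copy, e.g. on another computer.
git clone -q "$work/remote.git" "$work/desktop"

# More work on the first copy, sent to the remote...
echo 'summary(fit)' >> myFile.R
git commit -q -a -m "Summarise model"
git push -q

# ...and downloaded into the second copy.
cd "$work/desktop"
git pull -q
git log --oneline
```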

Branches default naming

  • A branch represents an independent line of development
  • default name of your first local branch is master
  • default name of your first remote repository is origin
  • this holds for every repo, e.g. the repo on GitHub has its own master
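Both default names can be checked in a fresh clone. A minimal sketch, assuming a bare repository on disk as a stand-in for the remote; the directory names are hypothetical.

```shell
work=$(mktemp -d)
git init -q --bare "$work/remote.git"     # stand-in for a hosted repository
git clone -q "$work/remote.git" "$work/copy"
cd "$work/copy"

remote_name=$(git remote)                 # name git gave the remote we cloned from
branch_name=$(git symbolic-ref --short HEAD)   # name of the first local branch

echo "remote: $remote_name"
echo "branch: $branch_name"
```

The remote is called origin after a clone. The branch name was master at the time of this talk; newer Git installations can be configured to use a different default, such as main.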

My project history

PhD student perspective

  • One repo per study
  • One repo per R/Stata-package
  • Commit frequency
  • Quality of commits
  • MEB archiving policies
  • Tags - for paper submissions, shared manuscripts, new data deliveries, the version of a stats-software package etc.

Thank you for listening!

Acknowledgements: Alexander Ploner, Henric Winell & Robert Karlsson